ABSTRACT

Using a data base compiled by the Graduate School of 
Northwestern University, a longitudinal study of the graduate school 
careers of 2,211 students in 14 programs was conducted. Among the 
most prominent findings was the increase in the enrollment of foreign 
students. The patterns of attainment of graduate school milestones, 
such as Ph.D. candidacy and graduation, were examined for each 
graduate program and for gender and ethnic groups. There was 
substantial variation across programs and, to a lesser degree, across 
demographic groups. Graduation rates for foreign students were higher 
than those for U.S. citizens. The association between the attainment 
of milestones and measures of academic potential, such as 
undergraduate grade point average (UGPA) and Graduate Record 
Examination (GRE) scores, was also investigated. The likelihood of 
attaining candidacy or of completing a doctorate was found to bear 
little relation to UGPA and GRE scores. This finding is probably a 
result of the use of UGPA and GRE in the selection of students into 
graduate programs. Appendix A presents 28 tables of ethnic and gender 
composition. Appendix B discusses survival analysis. Appendix C 
contains an empirical Bayes strategy for logistic regression. 
(Contains 20 figures, 7 tables in the text, and 22 references.) 
(Author/SLD) 



To many observers, graduate education in the United States is at a 
critical juncture. Recent studies show that American students represent a 
decreasing percentage of students enrolled in U.S. graduate schools. Another 
troublesome trend is the decrease in Black enrollment during the last decade 
(Brown, 1987; Trent & Copeland, 1987). There has been a growing concern that 
talented undergraduates may be choosing to go to professional schools or 
turning immediately to the world of work upon graduation (see Hartnett, 
1987). Undoubtedly there is a complex web of causes underlying these 
patterns, among them the financial burdens education imposes, perceived job 
opportunities and the vicissitudes of fashion. Graduate school deans are now 
faced with the challenge of analyzing these trends and developing appropriate 
policies. 

It is important, therefore, to determine what happens to those 
individuals who actually enroll in graduate school. At what pace do these 
students reach milestones in their graduate careers, such as advancement to 
candidacy and attainment of the Ph.D. degree? What attributes differentiate 
students who complete the doctorate from those who do not? How do the 
patterns of achievement differ across academic programs? Answers to these 
questions about pathways through graduate school can provide information that 
will be useful to graduate school policymakers in allocating resources and 
improving educational practices. 

To investigate these issues, we used a unique data base from the 
Graduate School of Northwestern University that can support longitudinal 
cohort analyses of graduate school careers. This data base contains the 
records of applicants to the Graduate School over a period of nearly fifteen 
years. 

Our research focused on fourteen graduate programs: Chemical 
Engineering, Computer Science, Chemistry, Mathematics, Physics, Counseling 
Psychology, Clinical Psychology, Sociology, Theatre, English, History, 
Political Science, Economics, and Philosophy. Chemical Engineering and 
Computer Science are part of the Technological Institute, Counseling 
Psychology is part of the School of Education, Clinical Psychology is part of 
Clinical Medicine, and Theatre is part of the School of Speech. The 
remaining programs come under the rubric of Arts and Sciences. These 
programs were selected because they are of general interest, their sample 
sizes are adequate, and they are thought to be relatively free of major 
administrative shifts during the time period in question. Only students who 
stated at entry that they were seeking a Ph.D. were included in the study. 

Research Questions 



Our research questions fell into two broad categories: 

1. How do the patterns of attainment of graduate school milestones, such as 
Ph.D. candidacy and graduation, differ across academic disciplines and across 
demographic subgroups? 



2. What is the association between students' attainment of milestones in 
their graduate careers and measures of their academic potential, such as 
undergraduate grade-point average (UGPA) and Graduate Record Examination 
(GRE) scores? 

Because our study focuses on a single school only, our findings cannot be 
assumed to have broad applicability. Rather, our research illustrates the 
kinds of analyses that may be useful in addressing policy decisions about 
enrollment, retention, and academic policy in graduate schools. 

Data Analysis 

Introduction 

Our data analyses are of three basic types. First, descriptive analyses 
were conducted, showing the numbers of students entering each of the 14 
graduate programs, the proportions of women, minorities, and foreign 
students, and the candidacy and graduation rates for various groups. The 
second category of analyses involves the examination of patterns of 
attainment of graduate school milestones for each of the graduate programs. 
The final phase of analysis involves investigation of the association between 
attainment of milestones and potential explanatory variables. Although the 
data base included students who entered between 1972 and 1986, some analyses 
were based on only a subset of these students. Details are given in the 
following sections. 

Descriptive Analyses 

Tables 1 and 2 provide information about the demographic makeup of 
Northwestern students in the 14 selected graduate programs for the entering 
classes of 1975 through 1986. (Students who entered during the years 
1972-1974 were not included because accurate demographic information was 
unavailable.) Entry years have been grouped into four sets of three years. 
Table 1 provides ethnic information and Table 2 gives the proportions of male 
and female students for the 14 graduate programs combined. Tables A1-A28 
(Appendix A) provide corresponding information for each of the 14 programs. 

In the Northwestern data base, students are assigned to one of the 
following categories, usually on the basis of self-report: 

1 American Indian 

2 Black 

3 Oriental 

4 Hispanic 

5 Foreign 

6 White 

7 Mexican-American/Chicano 

8 Puerto Rican 



Note that no ethnic information is available for foreign students. In the 
present study, ethnic categories have been grouped because of small sample 
sizes. In Tables 1 and 2, which combine information across the 14 programs, 
the "Asian" heading is a relabeling of category 3, the "Hispanic" heading 
includes categories 4, 7, and 8, and the "Other and missing" heading includes 
category 1, as well as those who are missing ethnic information. In the 
tables for individual programs (A1-A28), further collapsing of ethnic 
categories was necessary. Information is provided for Whites, Blacks, and 
foreign students; all other categories are included under the "Other and 
missing" heading. In interpreting Tables 1-2 and A1-A28, it is important to 
know that for the earlier years of data (through approximately 1976), 
Northwestern sometimes omitted ethnic codes for Whites. It is not possible 
to distinguish these White students from students for whom ethnic codes were 
omitted for other reasons. This explains the higher percentage of missing 
data and the lower percentage of Whites in the earlier years. 

The most striking aspect of the information in Tables 1-2 and A1-A28 is 
the increase in the percentage of foreign students in most programs. 
Overall, the percentage of entering Ph.D.-seekers who were foreign increased 
from 15% to 32%. The most dramatic changes were the increases in Computer 
Science, from 28% foreign students in 1975-1977 to 62% in 1984-1986, and in 
Physics, from 29% to 60%. (The large percentage change in Theatre, from 4% 
to 40%, is less noteworthy because of the small number of students.) 
Increases in the percentage of foreign students have been evident in other 
studies as well (e.g., National Research Council, 1986; Trent and Copeland, 
1987). 


The percentage of Black enrollees dropped from 3.3% to 1.6%; the percentage 
of Hispanics dropped from 3% to 1%. The percentage of Asians was less than 1% 
in 1975-1977, reached 4% in 1981-1983, and dropped to 3% in 1984-1986. (Note 
that these percentages of minority enrollment do not include foreign students.) 

Combined across programs, the 2:1 ratio of men to women has remained 
quite steady, although the ratio of men to women varies considerably across 
programs. The most significant within-program changes over time were the 
increases in the proportion of women in Clinical Psychology and Counseling 
Psychology. There was a large decrease in the proportion of women in 
Theatre. 

Tables 3 through 6 provide three types of information about Ph.D.-seeking 
students: (1) the percentage of students who attained candidacy by the end of 
the data collection in May 1987; (2) the percentage of those attaining 
candidacy who also graduated; and (3) the overall percentage of students who 
graduated. Admission to Ph.D. candidacy at Northwestern is contingent on 
completion of departmental requirements, including a comprehensive 
qualifying examination, and on the approval of the Graduate Faculty. 

Table 3 

Percents of 1972-1978 Entrants Attaining Graduate School Milestones 
by May 1987: Results for the 14 Graduate Programs 

Program                  Sample   Candidacy   Graduation, given   Graduation
                           size         (%)       candidacy (%)          (%)

Counseling Psychology       126          73                  82           60
Chemistry                   193          83                  96           80
English                     100          46                  72           33
History                      80          61                  67           41
Mathematics                  62          48                  73           36
Political Science           104          51                  77           39
Chemical Engineering         80          59                  96           56
Clinical Psychology          63          84                  85           71
Economics                   148          54                  85           46
Philosophy                   50          62                  58           36
Physics                      78          58                  96           55
Sociology                    91          67                  82           55
Theatre                      53          25                  46           11
Computer Science            151          31                  83           26
Total                      1379          59                  83           49

Table 4 

Percents of 1975-1978 Entrants Who Attained Candidacy by May 1987: 
Results for Gender and Ethnic Groups 

                                                       Other and
                       White       Black     Foreign    Missing*       Total

Male               64% (380)     50% (6)   62% (111)    22% (78)   58% (575)
Female             64% (188)    68% (19)    65% (26)    23% (52)   57% (285)
Missing Gender           (0)         (0)         (0)     6% (17)     6% (17)
Total              64% (568)    64% (25)   63% (137)   20% (147)   56% (877)

Note. Sample sizes are shown in parentheses. 
*Hispanics, Asians, and Native Americans are included in this category. 

Table 5 

Percents of 1975-1978 Entrants Who Graduated by May 1987, Given 
That They Achieved Candidacy: Results for Gender and Ethnic Groups 

                                                       Other and
                       White       Black     Foreign    Missing*       Total

Male               81% (242)     67% (3)    86% (69)    82% (17)   82% (331)
Female             72% (121)    77% (13)   100% (17)    83% (12)   76% (163)
Missing Gender           (0)         (0)         (0)      0% (1)      0% (1)
Total              78% (363)    75% (16)    88% (86)    80% (30)   80% (495)

Note. Sample sizes are shown in parentheses. 
*Hispanics, Asians, and Native Americans are included in this category. 

Table 6 

Percents of 1975-1978 Entrants Who Graduated by May 1987: 
Results for Gender and Ethnic Groups 

                                                       Other and
                       White       Black     Foreign    Missing*       Total

Male               52% (380)     33% (6)   53% (111)    18% (78)   47% (575)
Female             46% (188)    53% (19)    65% (26)    19% (52)   44% (285)
Missing Gender           (0)         (0)         (0)     0% (17)     0% (17)
Total              50% (568)    48% (25)   55% (137)   16% (147)   45% (877)

Note. Sample sizes are shown in parentheses. 
*Hispanics, Asians, and Native Americans are included in this category. 

Northwestern's Graduate School has regulations concerning the amount of 
time permitted for achieving candidacy and graduation. These official 
timetables must be considered in interpreting our findings, although, 
according to our results, they may not always have been followed. According 
to the 1985-1986 Northwestern catalog, 

a student is expected to be admitted to candidacy before 
the end of the third calendar year after initial 
registration in the Graduate School at Northwestern 
University; a student must be admitted to candidacy by 
the end of the twelfth quarter after initial 
registration. . . . (Northwestern University, 1985, p. 34) 

All requirements for the doctoral degree must be met 
within five years of admission to candidacy, or within 
eight years of the last year of consecutive full-time 
residency, to be calculated from the beginning of that 
year, or within ten years of the initial registration in 
the Graduate School, whichever comes first. . . . A student 
may petition for a [two-year] extension of the 
deadline. . . . There is no extension beyond two years 
(p. 33). 
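To make the catalog's "whichever comes first" rule concrete, the following Python sketch computes the latest year by which a student would have to finish the doctorate. It is a simplification under our own assumptions: every event is recorded as the calendar year in which the relevant academic year begins, the eight-year limit is counted from the start of the last year of consecutive full-time residency, and at most one two-year extension is granted. The function `degree_deadline` and its parameters are hypothetical illustrations, not part of any Northwestern record system.

```python
def degree_deadline(entry_year: int, candidacy_year: int,
                    last_full_time_year: int, extension: bool = False) -> int:
    """Latest completion year under the 1985-1986 catalog rules (simplified).

    Assumption: each argument is the year in which the relevant academic year
    begins, and every limit is counted in whole years from that start.
    """
    limits = [
        candidacy_year + 5,        # five years after admission to candidacy
        last_full_time_year + 8,   # eight years after the last full-time residency year began
        entry_year + 10,           # ten years after initial registration
    ]
    deadline = min(limits)         # "whichever comes first"
    if extension:
        deadline += 2              # one petitioned extension of at most two years
    return deadline


if __name__ == "__main__":
    # Hypothetical student: entered 1975, candidacy 1978, last full-time year 1979.
    print(degree_deadline(1975, 1978, 1979))                  # → 1983 (candidacy limit binds)
    print(degree_deadline(1975, 1978, 1979, extension=True))  # → 1985
```

For a student who reached candidacy late, the ten-year limit from entry or the eight-year residency limit can bind instead, which is why the sketch takes the minimum of all three limits.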

Table 3 gives candidacy and graduation information by graduate program 
for students who entered during the years 1972-1978. Collapsing across 
programs, the rates for candidacy; graduation, given candidacy; and 
graduation were 59%, 83%, and 49%, respectively. The highest candidacy and 
graduation rates were in Clinical Psychology (84% candidacy, 71% graduation) 
and Chemistry (83% candidacy, 80% graduation), while the lowest rates were in 
Theatre (25% candidacy, 11% graduation) and Computer Science (31% candidacy, 
26% graduation). The highest rates for graduation, given that candidacy had 
been attained, were in Chemistry, Chemical Engineering, and Physics (96% in 
each case); the lowest were in Theatre (46%) and Philosophy (58%). 
Northwestern staff have informed us that in some programs, students who are 
in reality seeking only a master's degree may state that they are seeking a 
Ph.D. in order to make themselves eligible for certain types of financial 
aid. This may, in part, explain the low rates of attainment in Computer 
Science and Theatre. 
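Because a Ph.D. cannot be awarded without prior admission to candidacy (an assumption consistent with the catalog rules quoted above), the overall graduation rate in Table 3 should equal, up to rounding, the candidacy rate multiplied by the graduation rate given candidacy. The Python check below, using the Table 3 percentages as transcribed here, computes that product for each program and reports the largest discrepancy. The dictionary and function names are illustrative.

```python
# Table 3 (1972-1978 entrants): (candidacy %, graduation given candidacy %, graduation %)
TABLE3 = {
    "Counseling Psychology": (73, 82, 60),
    "Chemistry": (83, 96, 80),
    "English": (46, 72, 33),
    "History": (61, 67, 41),
    "Mathematics": (48, 73, 36),
    "Political Science": (51, 77, 39),
    "Chemical Engineering": (59, 96, 56),
    "Clinical Psychology": (84, 85, 71),
    "Economics": (54, 85, 46),
    "Philosophy": (62, 58, 36),
    "Physics": (58, 96, 55),
    "Sociology": (67, 82, 55),
    "Theatre": (25, 46, 11),
    "Computer Science": (31, 83, 26),
    "Total": (59, 83, 49),
}


def implied_graduation(candidacy_pct: float, given_candidacy_pct: float) -> float:
    """P(graduate) = P(candidacy) * P(graduate | candidacy), in percent."""
    return candidacy_pct * given_candidacy_pct / 100


def max_discrepancy(table: dict) -> float:
    """Largest gap, in percentage points, between reported and implied graduation."""
    return max(abs(grad - implied_graduation(cand, given))
               for cand, given, grad in table.values())


if __name__ == "__main__":
    for program, (cand, given, grad) in TABLE3.items():
        implied = implied_graduation(cand, given)
        print(f"{program:<22} reported {grad:>3}%  implied {implied:5.1f}%")
    print(f"largest discrepancy: {max_discrepancy(TABLE3):.2f} points")
```

Every row agrees to within one percentage point (the largest gap is Mathematics, 36% reported versus 35.0% implied), which is consistent with the rounding of the published percentages.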

Tables 4 through 6 give information for ethnic and gender groups, 
combined across the 14 graduate programs, for students who entered during the 
years 1975-1978. (The ethnic categories are defined as in Tables A1-A28.) 
Collapsing across groups, the rates of candidacy; graduation, given 
candidacy; and graduation were 56%, 80%, and 45%, respectively. These rates 
differ slightly from those reported above, which were based on students who 
entered from 1972 to 1978. As shown in Table 4, candidacy rates for Whites, 
Blacks, and foreign students were nearly identical (63% to 64%). Only for 
Black students was there a substantial difference in the candidacy rates for 
males (50%) and females (68%), but because of the small sample sizes (n = 6 
for Black males), this finding should not be given too much weight. As 
indicated in Table 5, the rate of graduation, given candidacy, was higher for 
foreign students (88%) than for Whites (78%) and Blacks (75%). Among Whites, 
the rate was higher for men (81%) than for women (72%). Among foreign 
students and Blacks, however, the rate was higher for women. (Again, note 
the small sample size for Blacks.) All 17 female foreign students who 
achieved candidacy also graduated. The pattern of ethnic and gender 
differences was the same for the graduation rates, shown in Table 6. The 
rate for foreign students (55%) exceeded rates for Whites (50%) and Blacks 
(48%). Among Whites, the rate was higher for men, whereas among Black and 
foreign students, the rates were higher for women. 

There are several possible reasons for the higher rates of graduation 
for foreign students. Foreign students are likely to have been selected to 
study in the United States because of their academic excellence. Also, as 
Girves and Wemmerus (in press, p. 10) pointed out, "the fact that foreign 
students must be enrolled full-time and must demonstrate sufficient financial 
support to carry out their degree programs may be more incentive for them to 
complete their degrees. Domestic students, on the other hand, do not 
necessarily have these incentives, and may have other options outside of 
graduate school." 

The very low candidacy and graduation rates for students who were 
missing gender or ethnic information (see Tables 4 and 6) are somewhat 
mysterious. (Although the "Other and missing" ethnic category includes 
Asians, Hispanics, and Native Americans, about 80% of the students in this 
category were, in fact, missing ethnic information.) It could be that when 
students drop out, there is less opportunity for university personnel to fill 
in missing information on gender or ethnicity, thus creating an association 
between low attainment and the absence of these data. Because of the coding 
practices mentioned earlier, it is likely that a large proportion of students 
who are missing ethnic codes are, in fact, White. If all the students who 
were missing ethnic data were White, the rates of candidacy and graduation 
for Whites would be roughly 6 to 8 percentage points lower. The rates for 
Whites given in Tables 4 and 6 can be viewed as upper bounds on the actual 
rates. 
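The 6-to-8-point figure can be approximately reproduced from Tables 4 and 6. The sketch below rests on an illustrative assumption of ours: about 80% of the 147 "Other and missing" students (roughly 118) lacked ethnic codes and attained milestones at that category's overall rates (20% candidacy, 16% graduation). It then recomputes the White rates as if all of those students were White. The names `pooled_rate`, `N_WHITE`, and `N_MISSING` are hypothetical.

```python
def pooled_rate(n_a: float, rate_a: float, n_b: float, rate_b: float) -> float:
    """Rate for the union of two groups, weighting each group's rate by its size."""
    return (n_a * rate_a + n_b * rate_b) / (n_a + n_b)


N_WHITE = 568                    # Whites among 1975-1978 entrants (Tables 4 and 6)
N_MISSING = round(0.80 * 147)    # assumed number lacking ethnic codes (about 118)

# (milestone, White rate, assumed rate for students missing ethnic codes)
for milestone, white_rate, missing_rate in [("candidacy", 0.64, 0.20),
                                            ("graduation", 0.50, 0.16)]:
    combined = pooled_rate(N_WHITE, white_rate, N_MISSING, missing_rate)
    drop = 100 * (white_rate - combined)
    print(f"{milestone}: {100 * combined:.1f}% (about {drop:.1f} points lower)")
```

Under these assumptions the drops come out at about 7.6 points for candidacy and 5.8 points for graduation, in line with the rough range stated in the text.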

Patterns of Attainment of Graduate School Milestones 

The rates of attainment given in Tables 4-6 can provide only limited 
information about patterns of candidacy and graduation. A more detailed 
picture, based on all entering students, rather than 1972-1978 entrants only, 
can be achieved through survival analysis, a method often used in 
biostatistical applications (Kalbfleisch and Prentice, 1980). In survival 
analysis, we are interested in the survival function, which is the 
probability that an event will take more than x units of time to occur. In 
this study, the events of interest are the graduate school milestones, 
graduation and candidacy, and the units of time are graduate school years. 
The survival function, S(x), is defined as follows: 

$$S(x) = P(X > x) = 1 - F(x),$$

where X is the time elapsed until the milestone is reached and F(x) is the 
cumulative distribution function of X. A related function is the hazard 
function, h(x), which is the instantaneous risk of the occurrence of an event 
at X = x, given that the event has not occurred before time x. The hazard 
function is defined as 

$$h(x) = f(x) / S(x),$$

where f(x) is the probability density function of X. The hazard and survival 
functions are equivalent ways of summarizing the distribution of survival 
times, since 

$$h(x) = -\frac{d}{dx} \ln S(x).$$

If X is assumed to have the exponential distribution with parameter θ, h(x) 
is constant and equal to θ. 
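As a numerical check of these identities, the sketch below evaluates the hazard of an exponential distribution in two ways, as f(x)/S(x) and as the negative derivative of ln S(x) (approximated by a central finite difference), and confirms that both equal θ at every x. The value θ = 0.4 and the function names are illustrative choices, not quantities from the study.

```python
import math
from typing import Callable


def exp_survival(x: float, theta: float) -> float:
    """S(x) = exp(-theta * x) for the exponential distribution."""
    return math.exp(-theta * x)


def exp_density(x: float, theta: float) -> float:
    """f(x) = theta * exp(-theta * x), the exponential density."""
    return theta * math.exp(-theta * x)


def hazard_from_log_survival(survival: Callable[[float], float], x: float,
                             eps: float = 1e-5) -> float:
    """h(x) = -d/dx ln S(x), estimated with a central finite difference."""
    return -(math.log(survival(x + eps)) - math.log(survival(x - eps))) / (2 * eps)


if __name__ == "__main__":
    theta = 0.4
    for x in (0.5, 1.0, 3.0):
        ratio_form = exp_density(x, theta) / exp_survival(x, theta)
        log_form = hazard_from_log_survival(lambda t: exp_survival(t, theta), x)
        print(f"x={x}: f/S={ratio_form:.6f}  -dlnS/dx={log_form:.6f}")
```

Both columns print 0.400000 for every x: a constant hazard is exactly what the exponential assumption implies, whereas the EB estimates discussed below allow the hazard to rise and fall from year to year.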

It is important to note that the statistical terms "survival," 
"hazard," and "risk" are used here in a way that differs from everyday 
parlance. In our report, survival refers to the probability of remaining in 
graduate school without achieving the event of interest; for example, the 
probability that the degree is not received by a particular time. Similarly, 
we speak of the "hazards" or "risks" of attaining candidacy or completing a 
degree. (Some may find this usage to be counterintuitive in the present 
context; others may find it appropriate!) 

Standard methods exist for both nonparametric and parametric estimation 
of survival functions (e.g., see Kalbfleisch and Prentice, 1980). 
Difficulties in estimation can occur when sample sizes are small, however, 
particularly when it is of interest to estimate separate curves for 
subpopulations. Bayesian methods can yield more stable estimates by 
incorporating prior distributions for model parameters. Whereas previous 
Bayesian efforts have focused on the estimation of a single survival curve, 
Braun (1985) developed an empirical Bayes (EB) approach for estimating a 
family of survival functions. Details of the model and the estimation 
procedures are provided in Appendix B. A general description of EB methods 
is given in Braun (in press). 
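The details of Braun's empirical Bayes model are in Appendix B and are not reproduced here. As a rough illustration of why borrowing strength across programs stabilizes estimates, the sketch below applies a generic beta-binomial empirical Bayes shrinkage, a simpler swapped-in technique and not Braun's model, to one year's discrete hazard estimates. Each program's raw rate (events divided by the number at risk) is pulled toward the pooled rate, with the strength of the pull estimated from the data by the method of moments. The function name and all counts are hypothetical.

```python
def eb_shrink_hazards(events: list, at_risk: list) -> list:
    """Beta-binomial empirical Bayes shrinkage of per-program hazards (illustrative).

    events[i], at_risk[i]: counts for program i in a single graduate-school year.
    """
    k_groups = len(events)
    m = sum(events) / sum(at_risk)                      # pooled rate = prior mean
    raw = [e / n for e, n in zip(events, at_risk)]
    # Observed spread of raw rates around the pooled rate.
    s2 = sum((r - m) ** 2 for r in raw) / k_groups
    # Spread expected from binomial sampling alone if every program had rate m.
    sampling = m * (1 - m) * sum(1 / n for n in at_risk) / k_groups
    tau2 = max(s2 - sampling, 1e-9)                     # between-program variance, floored
    prior_size = max(m * (1 - m) / tau2 - 1, 0.0)       # a + b of the Beta prior
    # Posterior mean: a weighted average of the raw rate and the pooled rate.
    return [(e + prior_size * m) / (n + prior_size) for e, n in zip(events, at_risk)]


if __name__ == "__main__":
    # Hypothetical year-3 candidacy counts for three programs: events and number at risk.
    events, at_risk = [18, 3, 40], [40, 5, 90]
    for e, n, shrunk in zip(events, at_risk, eb_shrink_hazards(events, at_risk)):
        print(f"n={n:3d}  raw {e / n:.3f} -> shrunk {shrunk:.3f}")
```

Here the observed differences among programs are no larger than sampling noise would produce, so all three estimates, including the small program's raw 0.600, are pulled essentially to the pooled rate of about 0.452. When programs genuinely differ, the estimated prior size is small and the raw rates are left largely intact.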

Three types of survival analyses were conducted. The first two types 
pertained to the achievement of candidacy and graduation, respectively. The 
third type of analysis involved examination of the attainment of the Ph.D. 
degree, given that candidacy had been reached. For each of the three types 
of analysis, graphs of the EB estimates of the hazard and survival functions 
are provided. 

To facilitate interpretation of the survival analysis graphs, the 14 
selected graduate programs have been grouped as follows: Group I consists of 
the two programs that are part of the Technological Institute, Chemical 
Engineering and Computer Science, and the three most technical of the Arts 
and Sciences programs, Chemistry, Mathematics, and Physics. Group II 
consists of the three behavioral science programs, Counseling Psychology, 
Clinical Psychology, and Sociology, as well as the Theatre program, and Group 
III includes the remaining Arts and Sciences programs, English, History, 
Political Science, Economics, and Philosophy. (Additional analyses of 
candidacy and graduation were conducted in which only White Americans were 
included. The EB analysis produced results similar to those obtained for the 
total group of students; the classical analysis produced more unstable 
results because of the smaller sample sizes.) 

A phenomenon that had to be considered in analyzing these data is 
censoring: the removal of individuals from the risk set (the group of 
individuals who are available to experience the event of interest) for 
reasons other than the occurrence of the event. In this study, some 
individuals were censored because the data collection effort ended during 
their graduate careers. In the survival model applied here, censoring is 
accommodated through adjustment of the risk set. This means that if the 
termination of the data collection effort occurs at time x of a student's 
graduate career, that student will no longer be considered "exposed" or "at 
risk" for candidacy or graduation after time x. Note that, for purposes of 
our analyses, students who left graduate school without a Ph.D. are still 
considered to be part of the risk set. Roughly speaking, our analyses 
focused on the probabilities of achieving milestones in year x for those who 
entered school x years earlier. If it had been possible to obtain accurate 
information about student drop-out, the students who left school without 
attaining milestones could have been deleted from the risk set. This type of 
analysis, however, would have had a different interpretation. It would have 
involved estimation of the probabilities of attaining milestones by year x 
for those students still in school x years after entry. In an analysis of 
this kind, the attainment of milestones would have appeared more likely. 
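The risk-set bookkeeping described above can be sketched as a simple discrete-time life table. This Python sketch is a classical (non-Bayesian) estimate under our own conventions, not the EB estimator of Appendix B. Each student is a hypothetical pair (year, attained): the graduate-school year in which the milestone was reached or, when attained is False, the last year observed before data collection ended. Following the approach in the text, a student who left without the milestone is simply recorded as censored at the end of data collection, so he or she stays in the risk set until then. A censored student counts as at risk in the year of censoring, the usual Kaplan-Meier convention.

```python
from typing import List, Tuple


def life_table(records: List[Tuple[int, bool]], max_year: int):
    """Discrete-time hazard and survival estimates with right censoring.

    records: (year, attained) pairs, with graduate-school years counted from 1.
    Returns (hazards, survival): hazards[x-1] is the number of events in year x
    divided by the number at risk in year x, and survival[x-1] estimates S(x).
    """
    hazards, survival = [], []
    s = 1.0
    for x in range(1, max_year + 1):
        at_risk = sum(1 for year, _ in records if year >= x)
        events = sum(1 for year, attained in records if year == x and attained)
        h = events / at_risk if at_risk else 0.0
        s *= 1.0 - h                      # S(x) = product of (1 - h) over years 1..x
        hazards.append(h)
        survival.append(s)
    return hazards, survival


if __name__ == "__main__":
    # Hypothetical cohort of six students; False marks a censored record.
    cohort = [(3, True), (3, True), (4, True), (5, False), (2, False), (6, False)]
    h, s = life_table(cohort, max_year=6)
    for x, (hx, sx) in enumerate(zip(h, s), start=1):
        print(f"year {x}: hazard {hx:.3f}  survival {sx:.3f}")
```

In this toy cohort the year-3 hazard is 2/5 = 0.400, and the survival estimate falls from 1.000 to 0.600 in that year, the largest one-year drop; the student censored in year 2 is excluded from the year-3 risk set rather than counted as a failure.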

The results of the survival analyses for the 14 selected graduate 
programs are given in Figures 1-18. The initial sample sizes for the 14 
graduate programs ranged from 76 to 414 for the unconditional analyses of 
candidacy and graduation and from 25 (for Theatre) to 281 for the analysis of 
graduation, given candidacy. (In survival analyses, the size of the risk 
set decreases as more people attain the event. Therefore, estimates of 
hazard and survival functions for later time periods are based on fewer cases 
and are less precise than those for earlier time periods.) Note that, for 
each function, the vertical and horizontal scales of the graphs are the same 
within each of the three types of analysis, but they differ somewhat across 
analysis types. 

Figures 1-6 give the results of the candidacy analyses. For each of the 
three groups of programs, the graph of the estimated hazard functions appears 
first, followed by the graph of the estimated survival function. The hazard 
function at time x can be interpreted as the instantaneous "risk" that the 
event (candidacy or graduation) occurs at time x, given that it has not 
occurred prior to time x. The survival function at time x is the probability 
that the event has not occurred by time x. If the hazard function takes on a 
high value at time x, the survival function will show a correspondingly large
drop at time x.

One interesting aspect of the candidacy analyses in Figures 1-6 is that
results were similar among the programs in Group III, but not among the
programs in Groups I and II. In Group III (Figure 5), all five hazard
functions rose to a sharp peak at year 3 and then declined, indicating that
the third year of graduate school was the most likely time for the occurrence
of candidacy in these programs. Correspondingly, the survival functions
(Figure 6) showed a steep drop until year 4 and then started to level out.
For four of the programs, the probability that candidacy had not been
achieved dropped below .50 by year 5 and remained about the same through
year 14. (For English, it fell only to about .55.) Apparently, if candidacy
is not achieved by year 4, it is unlikely to be achieved later. The hazard
functions for Groups I and II (Figures 1 and 3) showed peaks in years 2-4,
but there was substantial variation among programs in the shapes of the
hazard functions. The survival functions (Figures 2 and 4) leveled out by
about the fifth year, but the values they eventually reached varied widely,
from about .65 for Theatre and Computer Science to slightly more than .20 for
Clinical Psychology and Chemistry.
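The link between a hazard peak and a drop in the survival function can be sketched in Python using the piecewise exponential form given as Equation 4 in Appendix B, where survival through year x is the product of exp(-hazard) over the one-year intervals up to x. The hazard values below are hypothetical, chosen only to peak in year 3 like the Group III candidacy hazards; they are not estimates from this study.

```python
import math
from typing import List, Sequence


def survival_from_hazards(hazards: Sequence[float], widths: Sequence[float] = ()) -> List[float]:
    """Survival at the end of each interval under a piecewise exponential model:
    S(i) = prod over l <= i of exp(-theta_l * width_l)."""
    widths = list(widths) or [1.0] * len(hazards)  # one-year intervals by default
    survival, cumulative = [], 0.0
    for theta, width in zip(hazards, widths):
        cumulative += theta * width
        survival.append(math.exp(-cumulative))
    return survival


hazards = [0.05, 0.15, 0.60, 0.20, 0.05, 0.02]  # hypothetical, peak in year 3
surv = survival_from_hazards(hazards)
drops = [prev - cur for prev, cur in zip([1.0] + surv[:-1], surv)]
print([round(s, 3) for s in surv])
print("largest drop in year", drops.index(max(drops)) + 1)  # year 3
```

Because survival is a running product, a high hazard in one interval shows up as the steepest step in the survival curve for that interval, and small later hazards produce the leveling-out seen in Figure 6.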

In the analyses of graduation, which are displayed in Figures 7-12, 
there was again greater similarity among the Group III programs than among 
Groups I and II. The hazards for Group III (Figure 11) peaked at year 5 
except for History, which peaked at years 6 and 10. The survival functions 
(Figure 12) leveled out between years 10 and 12, reaching values between 
about .70 for Philosophy and .55 for Economics. As in the candidacy 
analyses, Groups I and II displayed considerably more variation. Chemistry
and Chemical Engineering showed pronounced peaks in the hazard functions at
year 5 (Figure 7), as did Clinical Psychology at year 6 (Figure 9),
corresponding to steep drops in the survival functions (Figures 8 and 10).
For Group I, the survival functions (Figure 8) leveled off by year 8, at
values ranging from about .70 for Computer Science to about .30 for
Chemistry. The Group II survival functions (Figure 10) leveled off by year
12 at values ranging from about .80 for Theatre to about .35 for Clinical
Psychology.

Figures 13-18 show the results of the survival analysis for graduation, 
given that candidacy has occurred. In these figures, the x-axis represents 
years since the attainment of candidacy, rather than years since entry to 
graduate school. In these analyses, the Group I programs were the closest 
together and also showed the steepest drops. As of the sixth year after 
candidacy, the values of the survival function (Figure 14) ranged from about 
.25 for Math (representing a probability of .75 of completing a degree by 
this point for those who achieved candidacy) to slightly below .10 for 
Chemistry and Chemical Engineering. For Groups II and III (Figures 16 and 
18), the programs with the highest estimated survival probabilities were 
Theatre, History and Philosophy (all about .50); the program with the lowest 
value was Clinical Psychology (.10). 

It is hoped that analyses of the type displayed in Figures 1-18 can
be useful to graduate school deans in estimating the number of graduates an
entering class is likely to yield and in determining whether administrative
changes are needed to hasten progress in some graduate programs. Survival
analysis allows examination of candidacy and graduation rates at multiple
time points and thus gives a more detailed picture of milestone attainment
than simple rates of candidacy or graduation. For example, Figure 12 allows
us to state that, of an entering class of 10 students in Political Science or
Economics at Northwestern, one student would be expected to receive the
doctorate by four years after entry. Analyses of the kind illustrated in
Figures 13-18, which show the rate at which Ph.D. candidates complete their 
degrees, should be particularly useful to policymakers in targeting programs 
for administrative review. 
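The arithmetic behind the entering-class example is simply n[1 - S(x)], the class size times the probability that the milestone has occurred by time x. In the Python sketch below, S(4) = .90 is the value implied by the Figure 12 statement (one graduate in ten by year 4), and S = .25 six years after candidacy is the Math value quoted from Figure 14; the function name is hypothetical.

```python
def expected_attainers(class_size: int, survival_at_x: float) -> float:
    """Expected number of students who have attained the milestone by time x,
    where survival_at_x = S(x) is the probability it has NOT yet occurred."""
    if not 0.0 <= survival_at_x <= 1.0:
        raise ValueError("S(x) must be a probability")
    return class_size * (1.0 - survival_at_x)


print(expected_attainers(10, 0.90))  # about 1 graduate by year 4 after entry
print(expected_attainers(20, 0.25))  # 15 of 20 Math candidates done 6 years after candidacy
```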



Relation of Candidacy and Graduation to Measures of Academic Potential

Our original intention was to use logistic regression analysis (see 
Hanushek and Jackson, 1977) to model the relation between milestone 
attainment and such explanatory variables as undergraduate grade-point 
average (UGPA), GRE verbal score (GREV), and GRE quantitative score (GREQ). A
possible EB strategy for logistic regression, which takes advantage of 
existing EB methods for the normal case, is outlined in Appendix C. 

However, preliminary examination of the data revealed that GRE scores 
and UGPA were almost entirely unrelated to the achievement of candidacy and
graduation. The candidacy and graduation variables were defined as follows: 
Individuals received a code of one if they attained the milestone by August, 
1986 and a code of zero otherwise. (That is, both dropouts and those who 
remained in school without attaining the milestone received a code of zero.) 
Only students who entered between 1972 and 1978 were included in the 
analysis. GRE scores were available for 76% of these students overall; 
percents ranged from 48 to 94 across graduate programs. UGPA was available 
for 84%, with percents ranging from 58 to 95. Means and standard deviations 
of GRE scores and UGPA for the 14 graduate programs are given in Table 7, 
along with the percent of 1972-1978 entrants for which predictor information 
was available. (Scores were not available for the GRE analytical measure, 
which was first administered in its present form in 1981.)

Graphical displays of the correlations of the candidacy and graduation
variables with GRE scores and UGPA are given in Figures 19-20. Figure 19 
shows the point-biserial correlations between the candidacy indicator 
variable and GREV, GREQ, and UGPA. (A point-biserial correlation is a 
Pearson correlation between a dichotomous variable and a continuous variable; 
see, e.g., McNemar, 1962.) The left-most column lists the intervals for 
values of the correlation coefficients. The next column shows, for each 
interval, two-letter codes for the graduate programs for which the 
correlation between GREV and candidacy fell in that interval. (Graduate 
programs are listed alphabetically within intervals.) The next two columns 
give the analogous information for the correlations of candidacy with GREQ 
and UGPA, respectively. Figure 20 shows the corresponding correlations for 
the graduation indicator variable. The sample sizes on which these 
correlations are based ranged from 43 to 172. For some students, information 
was available for some preadmissions measures, but not others. Typically, 
correlations involving GREV and GREQ were based on identical or nearly 

Table 7

Descriptive Statistics for Predictor Variables
(1972-1978 Entrants)

                             Graduate Record Examination
                            Verbal     Quantitative          Undergraduate GPA
Graduate Program          N   Mean   SD   Mean   SD   %a    Mean   SD   %a

Counseling Psychology   126    599   93    543  106   48    3.21  .41   94
Chemistry               193    584   86    714   69   84    3.49  .34   89
English                 100    712   80    583  108   94    3.56  .35   92
History                  80    650   81    588  115   86    3.58  .38   91
Math                     62    623  114    695   92   74    3.62  .34   84
Political Science       104    697   69    636   93   79    3.49  .37   78
Chemical Engineering     80    643  107    702   76   65    3.39  .41   59
Clinical Psychology      63    593  141    666   89   83    3.53  .39   95
Economics               148    584  117    722   64   81    3.50  .40   74
Philosophy               50    460  107    556  127   94    3.66  .31   90
Physics                  78    553  143    731   83   76    3.32  .41   58
Sociology                91    612  113    584  123   78    3.50  .34   86
Theatre                  53    604  104    547  109   81    3.46  .33   91
Computer Science        151    523  140    706   79   58    3.47  .39   75

a Percent of 1972-1978 entrants for which predictor information was available.




Figure 19 



Point-Biserial Correlations of Measures of Academic Attainment 
with Candidacy Indicator Variable

The indicator variable equals one if candidacy was achieved by
August, 1986 and zero otherwise. Only students who entered between 
1972 and 1978 were included in the analysis. 
Sample sizes range from 43 to 172. 

Programs are listed alphabetically within intervals.

Figure 20

Point-Biserial Correlations of Measures of Academic Attainment
with Graduation Indicator Variable

The indicator variable equals one if graduation was achieved by
August, 1986 and zero otherwise. Only students who entered between
1972 and 1978 were included in the analysis.
Sample sizes range from 43 to 172.

Programs are listed alphabetically within intervals.

identical groups of students; analyses involving UGPA were based on a
slightly different, but overlapping, group.
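Since a point-biserial correlation is a Pearson correlation in which one variable is coded 0/1, it can be computed directly or from the group means as r_pb = (M1 - M0)(pq)^(1/2)/s, where p and q are the proportions coded 1 and 0 and s is the standard deviation of the scores (divisor n). The Python sketch below assumes hypothetical records that store the candidacy date as a (year, month) pair. It codes the indicator as described above (1 if the milestone was attained by August 1986, 0 for dropouts and for students still enrolled without it) and drops students with a missing predictor, which is why each correlation can rest on a different subset of students.

```python
import math
from statistics import mean, pstdev
from typing import Optional, Sequence, Tuple

CUTOFF = (1986, 8)  # August 1986, end of data collection


def milestone_indicator(date: Optional[Tuple[int, int]]) -> int:
    """1 if the milestone (year, month) was attained by the cutoff, else 0."""
    return int(date is not None and date <= CUTOFF)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation using population (divisor n) moments."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def point_biserial(indicator: Sequence[int], scores: Sequence[Optional[float]]) -> float:
    """r_pb = (M1 - M0) * sqrt(p * q) / s, using only cases with a score."""
    pairs = [(d, s) for d, s in zip(indicator, scores) if s is not None]
    ones = [s for d, s in pairs if d == 1]
    zeros = [s for d, s in pairs if d == 0]
    p = len(ones) / len(pairs)
    sd = pstdev([s for _, s in pairs])
    return (mean(ones) - mean(zeros)) * math.sqrt(p * (1 - p)) / sd


# Hypothetical 1972-1978 entrants: candidacy date and GRE verbal score
dates = [(1976, 5), (1977, 9), None, (1987, 2), (1975, 12), (1978, 1)]
grev = [600, 650, 550, 500, 700, None]
d = [milestone_indicator(t) for t in dates]  # [1, 1, 0, 0, 1, 1]
r = point_biserial(d, grev)
print(round(r, 3))
print(round(pearson(d[:5], grev[:5]), 3))  # same value on the complete cases
```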

For the candidacy variable, correlations ranged from -.19 to .29; for 
graduation, they ranged from -.23 to .37. The size of the correlations for 
GREV and GREQ seemed to be unrelated to the degree of quantitative emphasis 
in the graduate programs. For each of the three measures of academic 
potential, the ordering of the correlations in the graduation analysis 
roughly paralleled that obtained in the candidacy analysis. There were four
programs (Chemical Engineering, Clinical Psychology, English, and Math)
in which at least five of the six correlations displayed in Figures 19-20
were positive, and one program (History) in which all six correlations
were negative. The most striking aspect of these results, however, is that,
as shown in the last row of Figures 19 and 20, the medians for all six types 
of correlations were close to zero. Correlations of GREV, GREQ, and UGPA 
with reciprocal time to candidacy and reciprocal time to degree also tended 
to be very low, as did correlations of GRE advanced test scores with 
candidacy and graduation. It was hypothesized that GRE scores and UGPA might 
be more successful as predictors of graduation, given that candidacy had been 
achieved. Therefore, correlations of GREV, GREQ, and UGPA with graduation 
were computed for only those students who had achieved candidacy; these 
correlations, too, had medians close to zero. Finally, analyses were 
repeated for White Americans only, again producing similar results. 

To facilitate further exploration of the interrelationships between
measures of academic potential and the attainment of graduate school
milestones, a listing of several key variables was obtained for those who had
matriculated in any of the following eight programs during the years 1972 to
1978: Chemical Engineering, Chemistry, Mathematics, Counseling Psychology,
Clinical Psychology, English, History, and Political Science. (The earlier
phases of this study included these eight programs only.) The following
variables were listed: sex, ethnicity, UGPA, GREV, GREQ, a weighted sum of
UGPA, GREV, and GREQ, the candidacy and graduation indicator variables, and
the number of years to completion of the Ph.D., where applicable. These data
were examined in detail and were tabulated in various ways. For example,
stem-and-leaf diagrams of UGPA, GREV, GREQ, and the composite variable were
created for those who had and had not achieved candidacy and graduation.
Males, females, Blacks, and Whites were examined separately. These
painstaking analyses were intended to reveal any patterns that might have
gone undetected in more conventional analyses. However, no such patterns
were found. 



In typical validity studies of the GRE, researchers examine the 
correlation of GRE scores with grade-point average for the first year of 
graduate school (see Burton & Turner, 1983). First-year GPA was not 
available in the Northwestern data base. For students who completed 
doctorates, we examined the correlations of final GPA with GRE scores and 
UGPA. Unfortunately, the number of graduates within each program for whom 
predictor information was available was very small. Therefore, the sample 
sizes for these correlations averaged about 27. As in previous analyses, the 
correlations for the three predictors are based on somewhat different subsets 
of students. The median correlations for the three predictors were .05 for
GREV, .25 for GREQ, and .35 for UGPA. The correlations of UGPA with final
graduate school GPA were always positive, unlike the correlations involving 
GRE scores, which were negative in 4 out of 14 programs for both GREV and 
GREQ. 

These results show that, in a population of Ph.D.-seeking matriculants
in Northwestern's graduate school, conventional measures of verbal and
quantitative skills cannot discriminate between students who do and do not
achieve candidacy and graduation. This does not, of course, imply that the
GRE and UGPA are not useful in admissions: The population of graduate school
matriculants has already been selected on the basis of GRE scores, UGPA, and
other factors, and those with the least potential for achieving candidacy or
graduation are likely to have been weeded out. Therefore, the low 
correlations are not unexpected (see Dawes, 1975; Rubin, 1980). (In a 
summary of previously conducted studies of the relation between GRE scores 
and Ph.D. attainment, Willingham, 1974, reported median correlations of .18 
for GREV and .26 for GREQ. These results are not directly comparable to the 
present findings because the 47 correlations on which each median was based 
came from different institutions and corresponded to different administrative 
units.) In the case of the correlations between preadmissions measures and 
final GPA at Northwestern, selection is even more severe, since only those 
who completed graduate school are included in the analysis. The within- 
program means for final GPA ranged from 3.50 to 3.90, with standard 
deviations typically less than .25. 
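The selection argument can be illustrated with a small simulation in Python (NumPy). Purely for illustration, it assumes that a preadmissions composite and a later outcome are bivariate normal with correlation .50 in the applicant pool and that only the top 20% on the composite are admitted; none of these numbers come from the Northwestern data. The correlation within the admitted group comes out markedly lower, although the underlying relation is unchanged.

```python
import numpy as np


def correlation_before_after_selection(rho=0.5, n=100_000, admit_fraction=0.2, seed=0):
    """Correlation of predictor x with outcome y in the full pool, and among those
    selected because x exceeds the (1 - admit_fraction) quantile."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    full_r = np.corrcoef(x, y)[0, 1]
    admitted = x >= np.quantile(x, 1.0 - admit_fraction)  # selection on the predictor
    restricted_r = np.corrcoef(x[admitted], y[admitted])[0, 1]
    return full_r, restricted_r


full_r, restricted_r = correlation_before_after_selection()
print(f"applicant pool r = {full_r:.2f}, admitted group r = {restricted_r:.2f}")
```

Under these assumptions the correlation among admitted students is roughly half the applicant-pool value, so low within-program correlations are what range restriction alone would lead one to expect.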

Within the select population of graduate school matriculants, it is 
likely that personality factors such as perseverance, as well as the 
availability of financial and social support, play a crucial role in 
determining whether graduate school milestones are attained. In a study that 
included a student survey, Girves and Wemmerus (in press) found that
involvement in the graduate program (e.g., participation in research
projects, seminars, meetings, and social activities), student relationships
with faculty, and financial support had a direct or indirect effect on
progress toward the doctoral degree. There is some evidence that, at the
undergraduate level, admissions test results and preadmissions grades also
have little association with persistence toward the degree: Willingham (1985)
obtained the biserial correlations between a composite of high school rank
and SAT and persistence to the senior year of college. These correlations
were found to be very low; in six of the nine colleges studied, they did not
reach statistical significance.

These findings suggest that further research on candidacy and graduation 
rates should focus on noncognitive factors. It may be that improvements in 
candidacy and graduation rates can best be achieved by designing admissions 
procedures that place more weight on personality attributes like 
determination or persistence and by improving support systems for students 
already in school. 

Summary 

Several types of analyses were conducted, based on about 2700 Ph.D.-seeking
students who matriculated in 14 programs at Northwestern University's
Graduate School during a 15-year period. Descriptive analyses of students in
these programs who entered between 1975 and 1986 showed that the percentage
of foreign students increased from 15 in 1975-1977 to 32 in 1984-1986. The
percentages of Blacks and Hispanics dropped from about 3 to 1 during this
time, while the percentage of Asians increased from 1 to 3. Combined across
programs, the ratio of men to women remained relatively steady at about 2:1. 

Combined across all 14 graduate programs, the rates of candidacy and 
graduation for students with at least eight years of opportunity to achieve 
these milestones were 59% and 49%, respectively. The rate of graduation, 
given that candidacy had occurred, was 83%. There were substantial
variations in these rates across graduate programs and, to a lesser degree, 
across demographic groups. The highest candidacy and graduation rates were 
in Clinical Psychology and Chemistry; the lowest were in Theatre and Computer 
Science. The rate of candidacy was the same for Whites, Blacks, and foreign 
students, but both types of graduation rates were higher for foreign students 
than for Blacks and Whites. Among Whites, graduation rates were higher for 
men; among foreign students, they were higher for women. The superiority of 
candidacy and graduation rates for Black women over those for Black men must 
be interpreted with caution because of small sample sizes. Interpretation of 
the ethnic group results is complicated by the absence of ethnic codes for 
about 17% of the students in the analysis.

Survival analyses of candidacy and graduation showed that the Group III
programs (English, History, Political Science, Economics, and Philosophy)
produced very similar patterns. For candidacy, the survival functions showed
a steep drop to roughly .50 (corresponding to a candidacy rate of
1 - .50 = .50) in year 4 and then started to level out. For graduation, most
Group III programs showed sharp drops in their survival curves at about
year 5. The survival functions leveled out between years 10 and 12 to values
between .55 and .70 (corresponding to graduation rates between .45 and .30). Survival
functions for Groups I and II showed a great deal of variation. For example, 
the values at which the survival functions for graduation leveled out ranged 
between about .30 for Chemistry and .80 for Theatre. 

Analyses of the relation of measures of academic potential, such as
GREV, GREQ, and UGPA, to candidacy and graduation showed little
association between preadmissions measures and milestone attainment. Most of
the within-program correlations ranged between -.25 and .25; the medians of 
these correlations across the 14 programs were close to zero for each of the 
six pairs of variables. Evidently, within this select group of students, 
these conventional measures of academic skills cannot discriminate between 
those who do and do not achieve candidacy and graduation. 

The current study does not, of course, provide any information as to 
whether the obtained results may be generalized beyond Northwestern 
University. A multi-institution study is now underway that will involve
investigation of some of the phenomena examined here, with a particular focus
on the graduate school careers of minority students.

Appendix B - Survival Analysis

To develop the survival analysis model used here, we start out by
assuming a piecewise exponential distribution of survival times within each
graduate program. This implies, for each of K programs, a constant hazard,
$\theta_{ik}$ ($k = 1, 2, \ldots, K$; $i = 1, 2, \ldots, I$), within each of I
one-year time intervals. Let $\delta_{ikj}$ be an indicator variable such that
$\delta_{ikj} = 1$ if person j in program k experiences the event (candidacy or
graduation) in the $i$th interval; otherwise, $\delta_{ikj} = 0$. Let $t_{ikj}$ be
the amount of time person j in program k spends in the $i$th interval. Let
$d_{ik} = \sum_{j=1}^{J} \delta_{ikj}$ be the number of events that occur in
interval i for program k and let $t_{ik} = \sum_{j=1}^{J} t_{ikj}$ be the total
exposure time in interval i for members of graduate program k. We want to
estimate the IK values of $\theta_{ik}$. The likelihood for this model is

$$
L(\theta) = \prod_{k=1}^{K} \prod_{j=1}^{J} \prod_{i=1}^{I} \theta_{ik}^{\delta_{ikj}} \exp(-\theta_{ik} t_{ikj})
          = \prod_{k=1}^{K} \prod_{i=1}^{I} \theta_{ik}^{d_{ik}} \exp(-\theta_{ik} t_{ik})
\qquad [1]
$$



As demonstrated by Laird and Olivier (1981, p. 235) in the case of a
simpler model, the likelihood obtained by assuming separate piecewise
exponential distributions within programs is proportional to the likelihood
that would be obtained under the assumption that each $d_{ik}$ is an independent
Poisson variate, conditional on $t_{ik}$, with $E(d_{ik}) = t_{ik}\theta_{ik}$. That is,

$$
L^*(\theta) = \prod_{i=1}^{I} \prod_{k=1}^{K} \frac{(t_{ik}\theta_{ik})^{d_{ik}} \exp(-t_{ik}\theta_{ik})}{d_{ik}!}
= \left[ \prod_{i=1}^{I} \prod_{k=1}^{K} \frac{t_{ik}^{d_{ik}}}{d_{ik}!} \right]
  \prod_{i=1}^{I} \prod_{k=1}^{K} \theta_{ik}^{d_{ik}} \exp(-t_{ik}\theta_{ik})
\propto L(\theta)
\qquad [2]
$$

Because the likelihood kernels are the same, the two models can be used
interchangeably for making likelihood-based inferences about the parameters
$\theta_{ik}$.
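The proportionality in Equation 2 can be checked numerically. The Python sketch below evaluates the log of Equation 1 and the log of the Poisson likelihood for hypothetical counts $d_{ik}$ and exposures $t_{ik}$, flattened over (i, k) cells. The gap between the two is the same for any choice of hazards (it equals the sum of $d_{ik}\log t_{ik} - \log d_{ik}!$), so both yield the same inferences about $\theta_{ik}$.

```python
import math
from typing import Sequence


def loglik_piecewise_exponential(theta: Sequence[float], d: Sequence[int], t: Sequence[float]) -> float:
    """Log of Eq. [1]: sum over cells of d*log(theta) - theta*t."""
    return sum(dk * math.log(th) - th * tk for th, dk, tk in zip(theta, d, t))


def loglik_poisson(theta: Sequence[float], d: Sequence[int], t: Sequence[float]) -> float:
    """Log of the Poisson likelihood with mean t*theta for each count d."""
    return sum(dk * math.log(tk * th) - tk * th - math.lgamma(dk + 1)
               for th, dk, tk in zip(theta, d, t))


d = [3, 10, 4]          # hypothetical event counts for three (i, k) cells
t = [95.5, 80.0, 60.5]  # hypothetical exposure times
for theta in ([0.03, 0.12, 0.07], [0.05, 0.10, 0.02]):
    gap = loglik_poisson(theta, d, t) - loglik_piecewise_exponential(theta, d, t)
    print(round(gap, 6))  # identical for both theta vectors

mle = [dk / tk for dk, tk in zip(d, t)]  # occurrence/exposure rates maximize both
```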

The maximum likelihood estimate of $\theta_{ik}$ is simply the occurrence rate for
program k in interval i, $\hat{\theta}_{ik} = d_{ik}/t_{ik}$. In our analyses, we used a
conventional life table approximation for the total exposure time for program k
in interval i:

$$ t_{ik} \approx n_{ik} - \frac{d_{ik} + c_{ik}}{2}, $$

where $n_{ik}$ is the number of students in program k who had not yet experienced
the event of interest as of the beginning of the $i$th interval and $c_{ik}$ is the
number of students in program k who were censored during the $i$th interval
(see Laird & Olivier, 1981, p. 236).
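The life table calculation can be sketched in Python for a single hypothetical program. The risk set at the start of each interval shrinks only through events and censoring; consistent with the main text, students who left without a Ph.D. are not removed. The counts are invented, and the function name is an assumption for illustration.

```python
from typing import List, Sequence, Tuple


def life_table_hazards(n_start: int, events: Sequence[int],
                       censored: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Occurrence/exposure estimates d_ik / t_ik for one program, with the
    life table exposure t_ik = n_ik - (d_ik + c_ik) / 2."""
    exposures, rates = [], []
    n = n_start
    for d, c in zip(events, censored):
        t = n - (d + c) / 2.0
        exposures.append(t)
        rates.append(d / t if t > 0 else float("nan"))
        n -= d + c  # risk set at the start of the next interval
    return exposures, rates


# Hypothetical program: 100 entrants, four one-year intervals
exposures, rates = life_table_hazards(100, events=[5, 20, 30, 10], censored=[0, 0, 2, 4])
print(exposures)  # [97.5, 85.0, 59.0, 36.0]
print([round(r, 3) for r in rates])
```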

A problem with ratios of occurrence to exposure, like $d_{ik}/t_{ik}$, is that
they tend to be unstable when sample sizes are small. We therefore wish to
incorporate prior information about the parameters $\theta_{ik}$. If we were to
remain in the Poisson framework, the next step would be to assume a
distribution conjugate to the Poisson (a gamma distribution) for the
$\theta_{ik}$. Braun's (1985) approach, however, involves transforming the
Poisson variates to normal variates and then applying empirical Bayes methods
that have already been developed for the normal case. Let

$$ X_{ik} = \left( \frac{d_{ik}}{t_{ik}} \right)^{1/2}. $$

Then, if the Poisson assumption holds, we have approximately

$$ X_k \sim N(\mu_k, S_k), $$

where $X_k = (X_{1k}, X_{2k}, \ldots, X_{Ik})'$,
$\mu_k = (\theta_{1k}^{1/2}, \theta_{2k}^{1/2}, \ldots, \theta_{Ik}^{1/2})'$, and $S_k$ is a
diagonal matrix with $i$th diagonal element equal to $(4t_{ik})^{-1}$. The second
level of the model assumes that the vectors $\mu_k$ are independently generated
from a multivariate normal distribution, i.e.,

$$ \mu_k \sim N(\mu, \Sigma^*), \qquad k = 1, 2, \ldots, K. $$

We assume $\mu$ and $\Sigma^*$ are unknown and must be estimated from the data.

This model is a special case of the general regression model described
in Braun, Jones, Rubin, and Thayer (1983). Braun et al. show how the EM
algorithm (Dempster, Laird, & Rubin, 1977) can be used to obtain maximum
likelihood estimates of $\mu$ and $\Sigma^*$, as well as the posterior distributions
of the $\mu_k$ given these estimates and the data. The means of these posterior
distributions provide estimates of the $\mu_k$. Squaring these estimates in
turn yields estimates of the $\theta_{ik}$.
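A heavily simplified Python sketch may help show how EM shrinks the square-root hazards toward a common value. To keep it short, it handles a single interval across K programs and assumes the square-root hazards are independent N(m, tau2) draws, rather than using the correlated structure of Equation 3; it is not Braun's (1985) implementation, and the counts and exposures are hypothetical. As in the model above, the sampling variance of each $X_{ik}$ is treated as known, $(4t_{ik})^{-1}$.

```python
import math
from typing import List, Sequence, Tuple


def eb_hazards_one_interval(d: Sequence[int], t: Sequence[float],
                            n_iter: int = 500) -> Tuple[List[float], float, float]:
    """Simplified EB estimates of theta_k for one interval across K programs.
    Level 1: X_k = sqrt(d_k / t_k) ~ N(mu_k, 1 / (4 t_k)), variance known.
    Level 2 (simplifying assumption): mu_k ~ N(m, tau2) independently."""
    x = [math.sqrt(dk / tk) for dk, tk in zip(d, t)]
    v = [1.0 / (4.0 * tk) for tk in t]
    K = len(x)
    m = sum(x) / K
    tau2 = max(sum((xk - m) ** 2 for xk in x) / K, 1e-8)
    for _ in range(n_iter):
        # E step: normal-normal posterior of each mu_k given X_k, m, tau2
        post_var = [1.0 / (1.0 / tau2 + 1.0 / vk) for vk in v]
        post_mean = [pv * (m / tau2 + xk / vk) for pv, xk, vk in zip(post_var, x, v)]
        # M step: update the prior mean and variance
        m = sum(post_mean) / K
        tau2 = max(sum((pm - m) ** 2 + pv for pm, pv in zip(post_mean, post_var)) / K, 1e-12)
    # Square the posterior means to return to the hazard scale
    return [pm ** 2 for pm in post_mean], m, tau2


d = [2, 15, 6, 0, 9]                # hypothetical events in one interval
t = [20.0, 60.0, 40.0, 10.0, 50.0]  # hypothetical exposures
classical = [dk / tk for dk, tk in zip(d, t)]
eb, m, tau2 = eb_hazards_one_interval(d, t)
print([round(h, 3) for h in classical])
print([round(h, 3) for h in eb])  # pulled toward a common value
```

Programs with small exposure (large known variance) are pulled hardest toward the common mean, which is the "borrowing strength" behavior described later in this appendix.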

The estimation procedure for the EB survival analysis developed by Braun
(1985) differs in two ways from the general regression model of Braun et al.
(1983). First, the values of $\mathrm{Var}(X_{ik})$ are known in the present case
and need not be re-estimated in the M step of the EM algorithm. Second, to
reduce the number of parameters to be estimated, a special structure is
assumed for $\Sigma^*$, as follows:

$$
\Sigma^* = \sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{I-1} \\
\rho & 1 & \rho & \cdots & \rho^{I-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{I-1} & \rho^{I-2} & \rho^{I-3} & \cdots & 1
\end{pmatrix}
\qquad [3]
$$

That is, the correlations between the square roots of the hazards are assumed
to be geometrically decreasing: the correlation between intervals i and i' is
$\rho^{|i - i'|}$. This requires that the computational algorithm be modified
to obtain maximum likelihood estimates of $\sigma^2$ and $\rho$
(Szatrowski, 1976). Based on a preliminary investigation of the robustness
of the estimation procedure to the assumption of the covariance structure in
Equation 3, Braun (1985) concluded that the obtained estimates would not be
expected to vary greatly over a reasonable collection of assumed covariance
structures. In the current study, the obtained estimates of $\rho$ were .61 for
the graduation analysis, .31 for the candidacy analysis, and .60 for the
analysis of graduation, given candidacy.
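The covariance structure in Equation 3 is easy to construct. The NumPy sketch below builds $\Sigma^*$ for 14 one-year intervals (a hypothetical choice matching the span shown in the figures) using $\rho = .61$, the graduation-analysis estimate reported above, and a placeholder $\sigma^2 = 1$, since the estimated $\sigma^2$ is not reported here. Correlations fall off as $\rho, \rho^2, \rho^3, \ldots$ with distance between intervals.

```python
import numpy as np


def geometric_covariance(sigma2: float, rho: float, n_intervals: int) -> np.ndarray:
    """Sigma* with (i, i') element sigma2 * rho ** |i - i'| (Equation 3)."""
    idx = np.arange(n_intervals)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])


sigma_star = geometric_covariance(sigma2=1.0, rho=0.61, n_intervals=14)
print(np.round(sigma_star[0, :4], 3))            # 1, .61, .372, .227
print(np.linalg.eigvalsh(sigma_star).min() > 0)  # positive definite for |rho| < 1
```

Only two parameters ($\sigma^2$ and $\rho$) describe the whole I-by-I matrix, which is the point of the restriction.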

For the piecewise exponential survival model with intervals of length
$\Delta_i$, the probability of surviving through interval $i_0$ for an individual
in graduate program k is estimated by

$$ \hat{S}_k(i_0) = \prod_{i \le i_0} \exp(-\hat{\theta}_{ik} \Delta_i) \qquad [4] $$

This expression is equal to $\prod_{i \le i_0} \exp(-\hat{\theta}_{ik})$ if
$\Delta_i = 1$, $i = 1, 2, \ldots, I$, as in the present case. The classical
survival curves are obtained by setting $\hat{\theta}_{ik}$ equal to
$d_{ik}/t_{ik}$; EB curves are found by substituting the EB estimates of the
hazards.

The differences between the classical and EB estimates were more 
apparent when sample sizes were small, as in survival analyses (not shown) 
that included only White students who were U.S. citizens. In these analyses, 
the classical hazard estimates showed wild fluctuations, whereas the EB 
estimates, which borrow strength from the remaining graduate programs, were 
smoother and better behaved. The EB survival functions were also smoother and
closer together than their classical counterparts. Of course, the more
pleasing appearance of the EB graphs does not, in itself, demonstrate that
these estimates are superior. However, Braun (1985) presents two types of
evidence that support the superiority of the EB approach. First, a
cross-validation study of the methodology used here was conducted. Data were
divided in half at random and the EB estimates of a set of survival curves,
based on a half-sample, were compared with the classical estimates based on
each of the two half-samples. Each of the EB curves nearly bisected the two
more variable curves based on the classical approach, indicating that the EB
method successfully borrowed information to provide more stable estimates. 
In a second analysis, Braun investigated the properties of a fully Bayes 
survival analysis method closely related to the present approach. Bayes and 
classical estimates of hazard functions for a truncated data set were 
compared to classical estimates based on the full data set. The Bayes 
estimates for the truncated data were found to reproduce more closely the 
classical estimates based on the full data than did the classical estimates 
based on the truncated data.

Appendix C

An Empirical Bayes Strategy for Logistic Regression

A simplified EB strategy for logistic regression, which, like Braun's
(1985) survival analysis approach, takes advantage of existing EB methods for
the normal case, is as follows: For each of the K graduate programs, obtain
vectors of regression coefficients $\hat{B}_k$ and their asymptotic covariance
matrices $S_k$ from ordinary maximum likelihood logistic regression. Make use
of the fact that the $\hat{B}_k$ are asymptotically normal and treat the $S_k$
as known. Thus, we have

$$ \hat{B}_k \sim N(B_k, S_k), \qquad k = 1, 2, \ldots, K. $$

Now assume a normal prior for the $B_k$:

$$ B_k \sim N(\mu, \Sigma^*) $$

and get $E(B_k \mid \hat{B}_k, \hat{\mu}, \hat{\Sigma}^*)$, where $\hat{\mu}$ and
$\hat{\Sigma}^*$ are MLEs of $\mu$ and $\Sigma^*$, using the EM algorithm. This
approach is very similar to that of Korn and Whittemore (1979). A more
rigorous EB approach to logistic regression has been developed by Wong and
Mason (1985).
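The steps above can be sketched in Python with NumPy. The sketch fits each program's logistic regression by Newton-Raphson, takes $S_k$ as the inverse of the information matrix at the estimate, and then runs EM for the normal prior with the $S_k$ held fixed. Everything here is an illustrative assumption: the simulated data for six hypothetical programs, the intercept-plus-one-predictor design, the fixed iteration counts, and the unstructured $\Sigma^*$. It is not the procedure of Korn and Whittemore (1979) or of Wong and Mason (1985).

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Ordinary ML logistic regression by Newton-Raphson.
    Returns B_hat and its asymptotic covariance S = (X' W X)^{-1}."""
    B = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ B))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        B = B + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ B))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return B, np.linalg.inv(info)


def eb_logistic(B_hats, S_list, n_iter=200):
    """EM for B_hat_k ~ N(B_k, S_k) with S_k known and B_k ~ N(mu, Sigma).
    Returns posterior means E(B_k | B_hat_k, mu, Sigma), mu, and Sigma."""
    K = len(B_hats)
    mu = np.mean(B_hats, axis=0)
    Sigma = np.cov(np.array(B_hats).T) + 1e-6 * np.eye(len(mu))
    for _ in range(n_iter):
        Sigma_inv = np.linalg.inv(Sigma)
        # E step: posterior covariance V_k and mean M_k for each program
        V = [np.linalg.inv(Sigma_inv + np.linalg.inv(S)) for S in S_list]
        M = [Vk @ (Sigma_inv @ mu + np.linalg.solve(S, Bk))
             for Vk, S, Bk in zip(V, S_list, B_hats)]
        # M step: update the prior mean and covariance
        mu = np.mean(M, axis=0)
        Sigma = sum(np.outer(Mk - mu, Mk - mu) + Vk for Mk, Vk in zip(M, V)) / K
    return M, mu, Sigma


# Simulated data for six hypothetical programs: intercept + standardized GRE score
rng = np.random.default_rng(1)
B_hats, S_list = [], []
for k in range(6):
    X = np.column_stack([np.ones(80), rng.normal(size=80)])
    true_B = np.array([0.3 + 0.3 * rng.normal(), 0.1])
    y = (rng.random(80) < 1.0 / (1.0 + np.exp(-X @ true_B))).astype(float)
    B_hat, S = fit_logistic(X, y)
    B_hats.append(B_hat)
    S_list.append(S)

posterior_means, mu_hat, Sigma_hat = eb_logistic(B_hats, S_list)
print(np.round(np.array(B_hats), 2))
print(np.round(np.array(posterior_means), 2))
```

Treating the $S_k$ as known is what lets the normal-case EM machinery be reused unchanged; the price is that the approach leans on the asymptotic normality of each program's $\hat{B}_k$, which is weakest in the smallest programs.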